Probabilistic Size-constrained Microclustering

Authors

  • Arto Klami
  • Aditya Jitta
Abstract

Microclustering refers to clustering models that produce small clusters or, equivalently, to models where the size of the clusters grows sublinearly with the number of samples. We formulate probabilistic microclustering models by assigning a prior distribution on the size of the clusters, and in particular consider microclustering models with explicit bounds on the size of the clusters. The combinatorial constraints make full Bayesian inference complicated, but we manage to develop a Gibbs sampling algorithm that can efficiently sample from the joint cluster allocation of all data points. We empirically demonstrate the computational efficiency of the algorithm for problem instances of varying difficulty.

Similar resources

On Controlling the Size of Clusters in Probabilistic Clustering

Classical model-based partitional clustering algorithms, such as k-means or mixture of Gaussians, provide only loose and indirect control over the size of the resulting clusters. In this work, we present a family of probabilistic clustering models that can be steered towards clusters of desired size by providing a prior distribution over the possible sizes, allowing the analyst to fine-tune exp...

Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman–Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some tasks, this assumption is undesirable. For exam...

Flexible Models for Microclustering with Application to Entity Resolution

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman–Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate....

Reliability-Constrained Unit Commitment Considering Interruptible Load Participation

From the optimization point of view, an optimum solution of the unit commitment problem with reliability constraints can be achieved when all constraints are simultaneously satisfied rather than sequentially or separately satisfying them. Therefore, the reliability constraints need to be appropriately formulated in terms of the conventional unit commitment variables. In this paper, the reli...

A UNIFIED MODEL FOR RESOURCE-CONSTRAINED PROJECT SCHEDULING PROBLEM WITH UNCERTAIN ACTIVITY DURATIONS

In this paper we present a unified (probabilistic/possibilistic) model for the resource-constrained project scheduling problem (RCPSP) with uncertain activity durations, together with a heuristic approach connected to the theoretical model. It is shown that uncertainty management can be built into any heuristic algorithm developed to solve the RCPSP with deterministic activity durations. The esse...

Publication date: 2016